open science
FLIP Reasoning Challenge
Plesner, Andreas, Kuzhagaliyev, Turlan, Wattenhofer, Roger
Over the past years, advances in artificial intelligence (AI) have demonstrated how AI can solve many perception and generation tasks, such as image classification and text writing, yet reasoning remains a challenge. This paper introduces the FLIP dataset, a benchmark for evaluating AI reasoning capabilities based on human verification tasks on the Idena blockchain. FLIP challenges present users with two orderings of 4 images, requiring them to identify the logically coherent one. By emphasizing sequential reasoning, visual storytelling, and common sense, FLIP provides a unique testbed for multimodal AI systems. Our experiments evaluate state-of-the-art models, leveraging both vision-language models (VLMs) and large language models (LLMs). Results reveal that even the best open-source and closed-source models achieve maximum accuracies of 75.5% and 77.9%, respectively, in zero-shot settings, compared to human performance of 95.3%. Captioning models aid reasoning models by providing text descriptions of images, yielding better results than using the raw images directly: 75.2% vs. 69.6% for Gemini 1.5 Pro. Combining the predictions from 15 models in an ensemble increases the accuracy to 85.2%. These findings highlight the limitations of existing reasoning models and the need for robust multimodal benchmarks like FLIP. The full codebase and dataset will be available at https://github.com/aplesner/FLIP-Reasoning-Challenge.
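The abstract does not say how the 15 model predictions are combined. As a rough illustration only, the sketch below assumes simple majority voting over binary answers (which of the two orderings is coherent). The function names, the "left"/"right" labels, and the toy predictions are all hypothetical and are not taken from the FLIP codebase.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the predictions for one challenge."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_accuracy(per_model_predictions, gold_labels):
    """Accuracy of a majority-vote ensemble.

    per_model_predictions: one list of labels per model, aligned with gold_labels.
    """
    # Regroup so that each tuple holds every model's answer to one challenge.
    votes_per_challenge = zip(*per_model_predictions)
    ensemble = [majority_vote(votes) for votes in votes_per_challenge]
    correct = sum(pred == gold for pred, gold in zip(ensemble, gold_labels))
    return correct / len(gold_labels)


# Toy data (assumed): three models, four challenges, each model wrong once.
gold = ["left", "right", "left", "right"]
model_a = ["left", "right", "right", "right"]
model_b = ["left", "left", "left", "right"]
model_c = ["right", "right", "left", "right"]

print(ensemble_accuracy([model_a, model_b, model_c], gold))
```

In this toy case each model errs on a different challenge, so the vote corrects every individual mistake. With an odd number of voters and two labels, ties cannot occur.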
Open Science and Artificial Intelligence for supporting the sustainability of the SRC Network: The espSRC case
Garrido, J., Sánchez-Expósito, S., Ruiz-Falcó, A., Ruedas, J., Mendoza, M. Á., Vázquez, V., Parra, M., Sánchez, J., Labadie, I., Darriba, L., Moldón, J., Rodriguez-Álvarez, M., Díaz, J., Verdes-Montenegro, L.
The SKA Observatory (SKAO), a landmark project in radio astronomy, seeks to address fundamental questions in astronomy. To process its immense data output, approximately 700 PB/year, a global network of SKA Regional Centres (SRCNet) will provide the infrastructure, tools, and computational power needed for scientific analysis and scientific support. The Spanish SRC (espSRC) focuses on ensuring the sustainability of this network by reducing its environmental impact, integrating green practices into data platforms, and developing Open Science technologies to enable reproducible research. This paper discusses and summarizes part of the research and development activities that the team is conducting to reduce the SRC energy consumption at the espSRC and SRCNet. The paper also discusses fundamental research on trusted repositories to support Open Science practices.
- Europe > Switzerland (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Europe > Netherlands > North Holland > Haarlem (0.04)
- Europe > Montenegro (0.04)
- Information Technology > Artificial Intelligence (0.51)
- Information Technology > Data Science > Data Mining > Big Data (0.34)
Typhoon T1: An Open Thai Reasoning Model
Taveekitworachai, Pittawat, Manakul, Potsawee, Tharnpipitchai, Kasima, Pipatanakul, Kunat
This paper introduces Typhoon T1, an open effort to develop an open Thai reasoning model. A reasoning model is a relatively new type of generative model built on top of large language models (LLMs). A reasoning model generates a long chain of thought before arriving at a final answer, an approach found to improve performance on complex tasks. However, details on developing such a model are limited, especially for reasoning models that can generate traces in a low-resource language. Typhoon T1 presents an open effort that dives into the details of developing a reasoning model in a more cost-effective way by leveraging supervised fine-tuning using open datasets, instead of reinforcement learning. This paper shares the details about synthetic data generation and training, as well as our dataset and model weights. Additionally, we provide insights gained from developing a reasoning model that generalizes across domains and is capable of generating reasoning traces in a low-resource language, using Thai as an example. We hope this open effort provides a foundation for further research in this field.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Middle East > Jordan (0.04)
AI for Open Science: A Multi-Agent Perspective for Ethically Translating Data to Knowledge
Yakaboski, Chase, Hyde, Gregory, Nyanhongo, Clement, Santos, Eugene Jr
AI for Science (AI4Science), particularly in the form of self-driving labs, has the potential to sideline human involvement and hinder scientific discovery within the broader community. While prior research has focused on ensuring the responsible deployment of AI applications, enhancing security, and ensuring interpretability, we also propose that promoting openness in AI4Science discoveries should be carefully considered. In this paper, we introduce the concept of AI for Open Science (AI4OS) as a multi-agent extension of AI4Science with the core principle of maximizing open knowledge translation throughout the scientific enterprise rather than a single organizational unit. We use the established principles of Knowledge Discovery and Data Mining (KDD) to formalize a language around AI4OS. We then discuss three principal stages of knowledge translation embedded in AI4Science systems and detail specific points where openness can be applied to yield an AI4OS alternative. Lastly, we formulate a theoretical metric to assess AI4OS with a supporting ethical argument highlighting its importance. Our goal is that by drawing attention to AI4OS we can ensure the natural consequence of AI4Science (e.g., self-driving labs) is a benefit not only for its developers but for society as a whole.
- North America > United States > New Hampshire > Grafton County > Hanover (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Austria > Styria > Graz (0.04)
The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices
Cao, Hancheng, Dodge, Jesse, Lo, Kyle, McFarland, Daniel A., Wang, Lucy Lu
In recent years, funding agencies and journals increasingly advocate for open science practices (e.g. data and method sharing) to improve the transparency, access, and reproducibility of science. However, quantifying these practices at scale has proven difficult. In this work, we leverage a large-scale dataset of 1.1M papers from arXiv that are representative of the fields of physics, math, and computer science to analyze the adoption of data and method link-sharing practices over time and their impact on article reception. To identify links to data and methods, we train a neural text classification model to automatically classify URL types based on contextual mentions in papers. We find evidence that the practice of link-sharing to methods and data is spreading as more papers include such URLs over time. Reproducibility efforts may also be spreading because the same links are being increasingly reused across papers (especially in computer science), and these links are increasingly concentrated within fewer web domains (e.g. GitHub) over time. Lastly, articles that share data and method links receive increased recognition in terms of citation count, with a stronger effect when the shared links are active (rather than defunct). Together, these findings demonstrate the increased spread and perceived value of data and method sharing practices in open science.
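To make "contextual mentions" and "concentration within web domains" more concrete, here is a minimal sketch of the kind of preprocessing such an analysis might start from: pulling each URL out of a paper's text along with a window of surrounding characters, then counting links per domain. It is not the authors' pipeline. The regular expression, the window size, the function names, and the sample text with its URLs are assumptions made for illustration, and the neural classifier itself is not reproduced.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplistic URL pattern (assumption): stops at whitespace and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def url_mentions(text, window=60):
    """Return each URL in `text` with up to `window` characters of context per side."""
    mentions = []
    for match in URL_PATTERN.finditer(text):
        url = match.group().rstrip(".,;")  # drop trailing sentence punctuation
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        mentions.append({"url": url, "context": text[start:end]})
    return mentions


def domain_counts(mentions):
    """Count how many extracted links point at each web domain."""
    return Counter(urlparse(m["url"]).netloc for m in mentions)


# Hypothetical paper text with placeholder URLs.
sample = (
    "Our code is released at https://example.org/code/repo. "
    "The dataset is hosted at https://example.com/data.csv, "
    "and baselines are at https://example.org/code/baselines"
)

mentions = url_mentions(sample)
print([m["url"] for m in mentions])
print(domain_counts(mentions))
```

The context strings are what a text classifier would take as input to decide whether a link points to data, methods, or something else. The per-domain counts correspond to the concentration measure mentioned in the abstract.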
Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective
Arvan, Mohammad, Doğruöz, A. Seza, Parde, Natalie
Reproducibility is a key aspect of scientific advancement across disciplines, and reducing barriers for open science is a focus area for the theme of Interspeech 2023. Availability of source code is one of the indicators that facilitates reproducibility. However, less is known about the rates of reproducibility at Interspeech conferences in comparison to other conferences in the field. In order to fill this gap, we have surveyed 27,717 papers at seven conferences across speech and language processing disciplines. We find that despite having a similar number of accepted papers to the other conferences, Interspeech has up to 40% less source code availability. In addition to reporting the difficulties we have encountered during our research, we also provide recommendations and possible directions to increase reproducibility for further studies.
- North America > United States > Maine > Kennebec County > Waterville (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Alberta > Census Division No. 19 > Saddle Hills County (0.04)
Why diversity and inclusion needs to be at the forefront of future AI
Inês Hipólito is a highly accomplished researcher, recognized for her work in esteemed journals and contributions as a co-editor. She has received research awards including the prestigious Talent Grant from the University of Amsterdam in 2021. After her PhD, she held positions at the Berlin School of Mind and Brain and Humboldt-Universität zu Berlin. Currently, she is a permanent lecturer of the philosophy of AI at Macquarie University, focusing on cognitive development and the interplay between augmented cognition (AI) and the sociocultural environment. Neurourbanism as a Novel Approach in Global Health,' funded by the Berlin University Alliance.
EleutherAI: Going Beyond "Open Science" to "Science in the Open"
Phang, Jason, Bradley, Herbie, Gao, Leo, Castricato, Louis, Biderman, Stella
Over the past two years, EleutherAI has established itself as a radically novel initiative aimed at both promoting open-source research and conducting research in a transparent, openly accessible and collaborative manner. EleutherAI's approach to research goes beyond transparency: by doing research entirely in public, anyone in the world can observe and contribute at every stage. Our work has been received positively and has resulted in several high-impact projects in Natural Language Processing and other fields. In this paper, we describe our experience doing public-facing machine learning research, the benefits we believe this approach brings, and the pitfalls we have encountered.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
naab: A ready-to-use plug-and-play corpus for Farsi
Sabouri, Sadra, Rahmati, Elnaz, Gooran, Soroush, Sameti, Hossein
Huge corpora of textual data are known to be a crucial need for training deep models such as transformer-based ones. This need is even more pressing in lower-resource languages like Farsi. We propose naab, the biggest cleaned and ready-to-use open-source textual corpus in Farsi. It contains about 130GB of data, 250 million paragraphs, and 15 billion words. The project name is derived from the Farsi word NAAB, which means pure and high grade. We also provide the raw version of the corpus, called naab-raw, and an easy-to-use preprocessor that can be employed by those who want to make a customized corpus.
- Europe > Germany > Saxony > Leipzig (0.05)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
- North America > Dominican Republic (0.04)
BLOOM Is the Most Important AI Model of the Decade
You may be wondering if such a bold headline is true. GPT-3 came out in 2020 and established a new road the whole AI industry has been following in intention and attention since. Tech companies have repeatedly built better, larger models, one after another. But although they've put millions into the task, none of them has fundamentally changed the leading paradigm or the game's rules GPT-3 laid out two years ago. Gopher, Chinchilla, and PaLM (arguably the current podium of large language models) are significantly better than GPT-3 but they are, in essence, more of the same thing.